Problem 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

(a) Plot the series and discuss the main features of the data.
require(fpp2)
data(books)
autoplot(books) +
    scale_colour_manual(values=c('darkorchid', 'darkgreen'))

hardcover<- ts(books[,2], start = 1, frequency = 7)
softbound <- ts(books[,1], start = 1, frequency = 7)
decompose(hardcover)%>%
    autoplot()+
    ggtitle('Decomposition of Hardbound Books')

decompose(softbound)%>%
    autoplot()+
    ggtitle('Decomposition of Softbound Books')

(b) Use the ses() function to forecast each series, and plot the forecasts.
hb <- ses(hardcover, h=5)
sb <- ses(softbound, h=5)

autoplot(hb, colour ='darkgreen') +
    autolayer(fitted(hb), series ='Smoothed Values') +
    labs(title ='Hardbound Book Sales Forecast',
         x = "Year",
         y = "Daily Sales of Hardbound Books")

autoplot(sb, colour ='darkorchid') +
    autolayer(fitted(sb), series ='Smoothed Values') +
    labs(title ='Softbound Book Sales Forecast',
         x = "Year",
         y = "Daily Sales of Softbound Books")

acc_soft <- accuracy(sb)
acc_hard <-accuracy(hb)
  1. Compute the RMSE values for the training data in each case.

Hardbound Root Mean Squared Error: 31.931015
Softbound Root Mean Squared Error: 33.6376868

Problem 7.6

  1. Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
holt_soft <- holt(softbound, seasonal = 'additive', h=4)
holt_hard <- holt(hardcover, seasonal = 'additive', h=4)

autoplot(softbound) +
    autolayer(holt_soft, series = 'Holt Additive', PI=FALSE) +
        labs(title ='Holt 4-Day Linear Forecast of Softbound Sales',
         x = "Year",
         y = "Daily Sales of Softbound Books")

autoplot(hardcover) +
    autolayer(holt_soft, series = 'Holt Additive', PI=FALSE) +
        labs(title ='Holt4-Day Linear Forecast of Hardbound Sales',
         x = "Year",
         y = "Daily Sales of Hardbound Books")

acc_holt_hard <-accuracy(holt_hard)
acc_holt_soft <-accuracy(holt_soft)

(b) Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

Hardbound RMSE Comparison

RMSE Exponential Smoothed Hardbound Sales: 31.931
RMSE Exponential Smoothed Softbound Sales: 33.638

RMSE Holt-Winters Hardbound Sales: 27.194
RMSE Holt-Winters Softbound Sales: 31.137

The Holt-Winter method does indeed reduce the root mean squared errors for both the softbound and hardbound book sales. This would suggest that there is in fact come level of trend acting on the book sales series.

  1. Compare the forecasts for the two series using both methods. Which do you think is best?
fc_hh<-forecast(holt_hard)
fc_sh<-forecast(holt_soft)


fc_hes<-forecast(hb, h=4)
fc_ses<-forecast(sb, h = 4)

autoplot(softbound) +
    autolayer(fc_ses, series = 'Exponential Smoothed Softbound', PI=FALSE) +
    autolayer(fc_sh, series = 'Holt Linear Softbound', PI=FALSE) +
        labs(title ='Comparing Holt-Linear and Exponentially Smoothed Softbound Sales',
         x = "Year",
         y = "Daily Sales of Softbound Books")

autoplot(hardcover) +
    autolayer(fc_hes, series = 'Exponential Smoothed Softbound', PI=FALSE) +
    autolayer(fc_hh, series = 'Holt Linear Softbound', PI=FALSE) +
        labs(title ='Comparing Holt-Linear and Exponentially Smoothed Hardbound Sales',
         x = "Year",
         y = "Daily Sales of Hardbound Books")

  1. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
pi_ses_s<-1.96*acc_soft[2]
pi_ses_h<-1.96*acc_hard[2]

pi_hl_s<-1.96*acc_holt_soft[2]
pi_hl_s<-1.96*acc_holt_hard[2]

l_interval_ses_s <- round(fc_ses$mean[1]-pi_ses_s,3)
l_interval_ses_h <- round(fc_hes$mean[1]-pi_ses_h,3)

l_interval_hl_s <- round(fc_sh$mean[1]-pi_hl_s,3)
l_interval_hl_h <- round(fc_hh$mean[1]-pi_hl_s,3)

u_interval_ses_s <- round(fc_ses$mean[1]+pi_ses_s,3)
u_interval_ses_h <- round(fc_hes$mean[1]+pi_ses_h,3)

u_interval_hl_s <- round(fc_sh$mean[1]+pi_hl_s,3)
u_interval_hl_h <- round(fc_hh$mean[1]+pi_hl_s,3)
Model Lower Model Upper Calculated Lower Calculated Upper
SES Softbound 138.867 275.352 141.18 273.04
SES Hardbound 174.78 304.34 176.975 302.145
Holt Softbound 143.913 275.021 156.167 262.766
Holt Hardbound 192.922 307.426 196.874 303.473

The intervals are not exactly the same, with the manually calculated prediction intervals. Manual intervals are a bit less consistent with the library computed interval values using the holt-linear method than they are with exponentially smoothed model version.

Problem 7.10

For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1–2005Q1.

(a) Plot the data and describe the main features of the series.
data("ukcars")
autoplot(ukcars, colour = 'slateblue') +
    labs(title ='UKCars Data Q1 1977 - Q2 2005',
          x = 'Quarter - Year',
          y = 'Car Production')

The time series shows that the ‘ukcars’ data has a seasonal component, a trend, as well as some less structured periodicity and random-like variation.

  1. Decompose the series using STL and obtain the seasonally adjusted data.
ukcars%>%
    stl(t.window = 13, s.window = 'periodic', robust = TRUE)%>%
    autoplot()+
    ggtitle("Decomposition of UK Cars Data usign 'STL' ")

  1. Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be done in one step using stlf() with arguments etsmodel=“AAN”, damped=TRUE.)
fit <- stlf(ukcars, etsmodel='AAN', damped=TRUE)
stlf(ukcars, etsmodel='AAN', damped=TRUE)%>%
    autoplot()+
        labs(title ='UKCars Forecast using STL Damped',
          x = 'Quarter - Year',
          y = 'Car Production')

rmse_holt_d <- accuracy(fit)[2]
  1. Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data (as before but with damped=FALSE).
fit <- stlf(ukcars, etsmodel='AAN', damped=FALSE)
stlf(ukcars, etsmodel='AAN', damped=FALSE)%>%
    autoplot()+
        labs(title ='UKCars Forecast using STL Un-Damped',
          x = 'Quarter - Year',
          y = 'Car Production')

rmse_holt_u <- accuracy(fit)[2]
  1. Now use ets() to choose a seasonal model for the data.
ets(ukcars)%>%
    autoplot()+
         labs(title ='UKCars Forecast using ETS',
          x = 'Quarter - Year',
          y = 'Car Production')

fit <- ets(ukcars)
rmse_ets <- accuracy(fit)[2]
  1. Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?

STL: 23.295002
HOLT: 23.3211329
ETS: 25.2324409

The ETS method marginally provides the best RMSE of all the methods we have explored in this exercise, but the differences were absolutely minor.

  1. Compare the forecasts from the three approaches? Which seems most reasonable?
fit1 <- ets(ukcars)
fit1 %>% forecast(h=4) %>%
  autoplot() +
    ggtitle("ETS FOrcast of UKCars Data")

forecast(fit1)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       427.4885 394.2576 460.7195 376.6662 478.3109
## 2005 Q3       361.3329 322.2353 400.4305 301.5383 421.1275
## 2005 Q4       404.5358 360.3437 448.7280 336.9497 472.1219
## 2006 Q1       431.8154 383.0568 480.5741 357.2455 506.3854
## 2006 Q2       427.4885 374.5571 480.4200 346.5369 508.4401
## 2006 Q3       361.3329 304.5345 418.1313 274.4672 448.1986
## 2006 Q4       404.5358 344.1174 464.9542 312.1338 496.9378
## 2007 Q1       431.8154 367.9809 495.6500 334.1890 529.4419
fit2 <- stlf(ukcars, etsmodel='AAN', damped=FALSE)
fit2 %>% forecast(h=4) %>%
    autoplot() +
    ggtitle("STL Un-Damped Holt Linear (AAN) Forcast of UKCars Data")

forecast(fit2)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       416.1915 385.7950 446.5881 369.7040 462.6791
## 2005 Q3       366.3343 328.7907 403.8780 308.9163 423.7524
## 2005 Q4       403.8032 360.2690 447.3375 337.2233 470.3831
## 2006 Q1       436.4809 387.6847 485.2771 361.8535 511.1083
## 2006 Q2       419.8568 366.3120 473.4016 337.9671 501.7465
## 2006 Q3       369.9996 312.0932 427.9060 281.4394 458.5598
## 2006 Q4       407.4685 345.5056 469.4313 312.7045 502.2325
## 2007 Q1       440.1461 374.3755 505.9167 339.5587 540.7336
fit3 <- stlf(ukcars, etsmodel='AAN', damped=TRUE)
fit3%>%forecast(h=4) %>%
    autoplot() +
    ggtitle("Holts Damped Forcast of UKCars Data")

forecast(fit3)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       415.0289 384.4576 445.6001 368.2742 461.7836
## 2005 Q3       364.2543 326.8700 401.6386 307.0799 421.4287
## 2005 Q4       400.8059 357.6702 443.9416 334.8355 466.7762
## 2006 Q1       432.5663 384.3596 480.7731 358.8404 506.2922
## 2006 Q2       415.0250 362.2312 467.8189 334.2838 495.7663
## 2006 Q3       364.2507 307.2369 421.2646 277.0556 451.4459
## 2006 Q4       400.8026 339.8596 461.7456 307.5983 494.0069
## 2007 Q1       432.5633 367.9289 497.1976 333.7136 531.4130
autoplot(ukcars)+
    autolayer(forecast(fit1), series ="ETS", PI=FALSE)+
    autolayer(forecast(fit2), series ="Holts Linear Un-Damped", PI=FALSE)+
    autolayer(forecast(fit3), series ="Holts Linear Damped", PI=FALSE )+
    ggtitle("Forecast Overlays ")+
    xlab('Years')+
    ylab("Car Production Levels")

Looking at the RMSE of each method, plus these charts of the predictions and the data itself the question becomes how do you choose? All of the models have approximately the same RMSE, they all have almost Idendical plots projecting forward….the question is at what value do they start and move on…it seems like the best way to choose would be to select a forcast with the tightest prediction interval around the first few predictions.

Both of the Holt Linear Damped and the Holt Linear Un-Damped are about the same, and slightly better than ETS model, so I would feel comfortable using either of them in making next quarter predictions short of some crazy chang ein the global economy or automotive markets. I think given that they are close to the same, I would choose the model with the least amount of mathematical bamboozelry and go with Un-Damped as it is a simpler model.

  1. Check the residuals of your preferred model.
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,A,N)
## Q* = 22.061, df = 4, p-value = 0.0001949
## 
## Model df: 4.   Total lags used: 8

In looking at the residuals, they are relatively normally distributed, and the line chart shows no specific pattern to them. Both of these things are good. However in looking at the lag/acf there is clearly some pattern unaccounted for in the residuals, seen in the wave-like form there. Also the Ljung-Box tests tells us that we are not looking at an independently distributed set of residuals. However, this should be expected given the nature of our data and the fact that we did not account for the seasonality in it prior to modeling.